{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "whjsJasuhstV"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_05_2_parsers.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "euOZxlIMhstX"
   },
   "source": [
    "# T81-559: Applications of Generative Artificial Intelligence\n",
    "**Module 5: LangChain: Data Extraction**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d4Yov72PhstY"
   },
   "source": [
    "# Module 5 Material\n",
    "\n",
    "* Part 5.1: Structured Output Parser [[Video]](https://www.youtube.com/watch?v=62CSR141VRE) [[Notebook]](t81_559_class_05_1_langchain_data.ipynb)\n",
    "* **Part 5.2: Other Parsers (CSV, JSON, Pandas, Datetime)** [[Video]](https://www.youtube.com/watch?v=VXm8gPzU3qc) [[Notebook]](t81_559_class_05_2_parsers.ipynb)\n",
    "* Part 5.3: Pydantic parser [[Video]](https://www.youtube.com/watch?v=dc4fn-W60hg) [[Notebook]](t81_559_class_05_3_pydantic.ipynb)\n",
    "* Part 5.4: Custom Output Parser [[Video]](https://www.youtube.com/watch?v=jBpkAblQC_U) [[Notebook]](t81_559_class_05_4_custom_parsers.ipynb)\n",
    "* Part 5.5: Output-Fixing Parser [[Video]](https://www.youtube.com/watch?v=_txWiLjf4bo) [[Notebook]](t81_559_class_05_5_output_fixing_parsers.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AcAUP0c3hstY"
   },
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running and maps Google Drive if needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "xsI496h5hstZ",
    "outputId": "27793b97-f71a-4955-829c-f44ea059703b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: using Google CoLab\n",
      "Collecting langchain\n",
      "  Downloading langchain-0.1.17-py3-none-any.whl (867 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m867.6/867.6 kB\u001b[0m \u001b[31m3.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting langchain_openai\n",
      "  Downloading langchain_openai-0.1.5-py3-none-any.whl (34 kB)\n",
      "Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (6.0.1)\n",
      "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.0.29)\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (3.9.5)\n",
      "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (4.0.3)\n",
      "Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)\n",
      "  Downloading dataclasses_json-0.6.5-py3-none-any.whl (28 kB)\n",
      "Collecting jsonpatch<2.0,>=1.33 (from langchain)\n",
      "  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)\n",
      "Collecting langchain-community<0.1,>=0.0.36 (from langchain)\n",
      "  Downloading langchain_community-0.0.36-py3-none-any.whl (2.0 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m8.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting langchain-core<0.2.0,>=0.1.48 (from langchain)\n",
      "  Downloading langchain_core-0.1.50-py3-none-any.whl (302 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m302.8/302.8 kB\u001b[0m \u001b[31m8.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting langchain-text-splitters<0.1,>=0.0.1 (from langchain)\n",
      "  Downloading langchain_text_splitters-0.0.1-py3-none-any.whl (21 kB)\n",
      "Collecting langsmith<0.2.0,>=0.1.17 (from langchain)\n",
      "  Downloading langsmith-0.1.53-py3-none-any.whl (116 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.4/116.4 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: numpy<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.25.2)\n",
      "Requirement already satisfied: pydantic<3,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.7.1)\n",
      "Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.31.0)\n",
      "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (8.2.3)\n",
      "Collecting openai<2.0.0,>=1.10.0 (from langchain_openai)\n",
      "  Downloading openai-1.25.1-py3-none-any.whl (312 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m312.9/312.9 kB\u001b[0m \u001b[31m9.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting tiktoken<1,>=0.5.2 (from langchain_openai)\n",
      "  Downloading tiktoken-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.8/1.8 MB\u001b[0m \u001b[31m17.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (23.2.0)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.4.1)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.5)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.9.4)\n",
      "Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain)\n",
      "  Downloading marshmallow-3.21.2-py3-none-any.whl (49 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.3/49.3 kB\u001b[0m \u001b[31m1.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain)\n",
      "  Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n",
      "Collecting jsonpointer>=1.9 (from jsonpatch<2.0,>=1.33->langchain)\n",
      "  Downloading jsonpointer-2.4-py2.py3-none-any.whl (7.8 kB)\n",
      "Collecting packaging<24.0,>=23.2 (from langchain-core<0.2.0,>=0.1.48->langchain)\n",
      "  Downloading packaging-23.2-py3-none-any.whl (53 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.0/53.0 kB\u001b[0m \u001b[31m3.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting orjson<4.0.0,>=3.9.14 (from langsmith<0.2.0,>=0.1.17->langchain)\n",
      "  Downloading orjson-3.10.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m142.7/142.7 kB\u001b[0m \u001b[31m1.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (3.7.1)\n",
      "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (1.7.0)\n",
      "Collecting httpx<1,>=0.23.0 (from openai<2.0.0,>=1.10.0->langchain_openai)\n",
      "  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m7.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (1.3.1)\n",
      "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (4.66.2)\n",
      "Requirement already satisfied: typing-extensions<5,>=4.7 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (4.11.0)\n",
      "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain) (0.6.0)\n",
      "Requirement already satisfied: pydantic-core==2.18.2 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain) (2.18.2)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (3.3.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (3.7)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (2.0.7)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (2024.2.2)\n",
      "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (3.0.3)\n",
      "Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken<1,>=0.5.2->langchain_openai) (2023.12.25)\n",
      "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai<2.0.0,>=1.10.0->langchain_openai) (1.2.1)\n",
      "Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai<2.0.0,>=1.10.0->langchain_openai)\n",
      "  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m6.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai<2.0.0,>=1.10.0->langchain_openai)\n",
      "  Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain)\n",
      "  Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n",
      "Installing collected packages: packaging, orjson, mypy-extensions, jsonpointer, h11, typing-inspect, tiktoken, marshmallow, jsonpatch, httpcore, langsmith, httpx, dataclasses-json, openai, langchain-core, langchain-text-splitters, langchain_openai, langchain-community, langchain\n",
      "  Attempting uninstall: packaging\n",
      "    Found existing installation: packaging 24.0\n",
      "    Uninstalling packaging-24.0:\n",
      "      Successfully uninstalled packaging-24.0\n",
      "Successfully installed dataclasses-json-0.6.5 h11-0.14.0 httpcore-1.0.5 httpx-0.27.0 jsonpatch-1.33 jsonpointer-2.4 langchain-0.1.17 langchain-community-0.0.36 langchain-core-0.1.50 langchain-text-splitters-0.0.1 langchain_openai-0.1.5 langsmith-0.1.53 marshmallow-3.21.2 mypy-extensions-1.0.0 openai-1.25.1 orjson-3.10.2 packaging-23.2 tiktoken-0.6.0 typing-inspect-0.9.0\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "try:\n",
    "    from google.colab import drive, userdata\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False\n",
    "\n",
    "# OpenAI Secrets\n",
    "if COLAB:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = userdata.get('OPENAI_API_KEY')\n",
    "\n",
    "# Install needed libraries in CoLab\n",
    "if COLAB:\n",
    "    !pip install langchain langchain_openai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pC9A-LaYhsta"
   },
   "source": [
    "# 5.2: Other Parsers (Comma List, JSON, Pandas, Datetime)\n",
    "\n",
    "\n",
    "In this section, we'll explore how LangChain offers versatile parsers capable of handling a variety of data formats, enhancing its functionality across numerous applications. Among these, it can seamlessly integrate with data in the form of Pandas dataframes, comma-separated lists, JSON structures, and datetime objects, among others. This capability ensures that LangChain can adapt to diverse data inputs, making it a powerful tool for data manipulation and analysis in different contexts. We will delve into some of these parsers and demonstrate their practical applications, highlighting how they can be utilized to streamline processes and extract meaningful insights from data.\n",
    "\n",
    "## Parse Comma Separated List Response\n",
    "\n",
    "We will begin with the CommaSeparatedListOutputParser parser, which can take the LLM output in a comma-separated list and extract it as a Python list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "-Vds7BAWITTH"
   },
   "outputs": [],
   "source": [
    "from langchain.output_parsers import CommaSeparatedListOutputParser\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "output_parser = CommaSeparatedListOutputParser()\n",
    "\n",
    "format_instructions = output_parser.get_format_instructions()\n",
    "prompt = PromptTemplate(\n",
    "    template=\"List ten {subject}.\\n{format_instructions}\",\n",
    "    input_variables=[\"subject\"],\n",
    "    partial_variables={\"format_instructions\": format_instructions},\n",
    ")\n",
    "\n",
    "MODEL = \"gpt-4o-mini\"\n",
    "TEMPERATURE = 0\n",
    "\n",
    "# Initialize the OpenAI LLM with your API key\n",
    "llm = ChatOpenAI(\n",
    "    model=MODEL,\n",
    "    temperature=TEMPERATURE,\n",
    "    n=1\n",
    ")\n",
    "\n",
    "chain = prompt | llm | output_parser"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "vs9tTrVh78Sy"
   },
   "source": [
    "Extract a list of cities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "W7gvIB0kxORP",
    "outputId": "f6e4b2c4-0153-4e77-b9df-263f48b32b9a"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['New York City',\n",
       " 'Los Angeles',\n",
       " 'Chicago',\n",
       " 'Houston',\n",
       " 'Phoenix',\n",
       " 'Philadelphia',\n",
       " 'San Antonio',\n",
       " 'San Diego',\n",
       " 'Dallas',\n",
       " 'San Jose']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke({\"subject\": \"cities\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7z-cMptW7-wq"
   },
   "source": [
    "Extract a list of programming languages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ljZKh6PKxOIi",
    "outputId": "ab7548ff-3013-42df-9eda-f84b378be275"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Java',\n",
       " 'Python',\n",
       " 'C++',\n",
       " 'JavaScript',\n",
       " 'Ruby',\n",
       " 'Swift',\n",
       " 'PHP',\n",
       " 'C#',\n",
       " 'Go',\n",
       " 'Kotlin']"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke({\"subject\": \"programming languages\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "mE2avWGaysc3"
   },
   "source": [
    "## Parse JSON Response\n",
    "\n",
    "We can format the output from the LLM into JSON. For this example, we will accept a sentence that we detect as English and then translate it into Spanish, French, and Chinese."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "8pvbqwz9xOAQ",
    "outputId": "603e3fd8-6991-444d-c88c-e48438b8a04b"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'detected': 'English',\n",
       " 'spanish': '¿Cuál es tu nombre?',\n",
       " 'french': 'Quel est ton nom?',\n",
       " 'chinese': '你叫什么名字？'}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_core.output_parsers import JsonOutputParser\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_core.pydantic_v1 import BaseModel, Field\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "# Define your desired data structure.\n",
    "class Translate(BaseModel):\n",
    "  detected: str = Field(description=\"the detected language of the input\")\n",
    "  spanish: str = Field(description=\"the input translated to Spanish\")\n",
    "  french: str = Field(description=\"the input translated to French\")\n",
    "  chinese: str = Field(description=\"the input translated to Chinese\")\n",
    "\n",
    "# And a query intented to prompt a language model to populate the data structure.\n",
    "input_text = \"What is your name?\"\n",
    "\n",
    "# Set up a parser + inject instructions into the prompt template.\n",
    "parser = JsonOutputParser(pydantic_object=Translate)\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    template=\"Answer the user query.\\n{format_instructions}\\n{input}\\n\",\n",
    "    input_variables=[\"input\"],\n",
    "    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
    ")\n",
    "\n",
    "chain = prompt | llm | parser\n",
    "\n",
    "chain.invoke({\"input\": input_text})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dAY721mN14kp"
   },
   "source": [
    "## Query Pandas Dataframe\n",
    "\n",
    "Langchain's capabilities include parsing and analyzing Pandas dataframes using the PandasDataFrameOutputParser. This feature allows users to seamlessly integrate data stored in Pandas dataframes and use Langchain to query and extract insights from this data. By leveraging the PandasDataFrameOutputParser, Langchain can interpret the dataframe's structure, contents, and context, enabling it to provide accurate answers to user queries. This integration is particularly useful for data analysis, enabling more interactive and natural language-based exploration of data stored in Pandas dataframes.\n",
    "\n",
    "The following code reads and displays the first lines from the classic [iris dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "VMP2U3ZJ0hDP",
    "outputId": "5cc7b4c6-2ad6-4458-ec38-32a9f8095de7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   sepal_l  sepal_w  petal_l  petal_w      species\n",
      "0      5.1      3.5      1.4      0.2  Iris-setosa\n",
      "1      4.9      3.0      1.4      0.2  Iris-setosa\n",
      "2      4.7      3.2      1.3      0.2  Iris-setosa\n",
      "3      4.6      3.1      1.5      0.2  Iris-setosa\n",
      "4      5.0      3.6      1.4      0.2  Iris-setosa\n"
     ]
    }
   ],
   "source": [
    "import pprint\n",
    "from typing import Any, Dict\n",
    "\n",
    "import pandas as pd\n",
    "from langchain.output_parsers import PandasDataFrameOutputParser\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "# Load the iris dataset\n",
    "df = pd.read_csv(\n",
    "    \"https://data.heatonresearch.com/data/t81-558/iris.csv\", na_values=[\"NA\", \"?\"]\n",
    ")\n",
    "\n",
    "print(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6errbnrFA1SC"
   },
   "source": [
    "Next we load the iris dataframe into a PandasDataFrameOutputParser class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "5h1Y5yhKxNzS"
   },
   "outputs": [],
   "source": [
    "parser = PandasDataFrameOutputParser(dataframe=df)\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
    "    input_variables=[\"query\"],\n",
    "    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
    ")\n",
    "\n",
    "chain = prompt | llm | parser"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5dPMQYCXA8r_"
   },
   "source": [
    "We query for the mean of one of the columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "8dLvNJv62fIQ",
    "outputId": "92779e74-1eba-4c1b-c103-b0d5f55b85af"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'mean': 5.843333333333334}\n"
     ]
    }
   ],
   "source": [
    "query = \"Get the mean of the sepal_l column.\"\n",
    "parser_output = chain.invoke({\"query\": query})\n",
    "print(parser_output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ZhroER0yBG5G"
   },
   "source": [
    "We query for the sub of one of the columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "lE-U8jO24oBY",
    "outputId": "82e8b8d4-4ffc-4597-8b11-de4825923f6e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'sum': 179.90000000000003}\n"
     ]
    }
   ],
   "source": [
    "query = \"Get the sum of petal_w column.\"\n",
    "parser_output = chain.invoke({\"query\": query})\n",
    "print(parser_output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Y09-epHS5McO"
   },
   "source": [
    "## Datetime\n",
    "\n",
    "Langchain includes a feature known as the DatetimeOutputParser, which is specifically designed to parse datetime values from text. This capability allows it to recognize and interpret dates and times expressed in various formats, converting them into a standardized datetime format. This functionality is invaluable in applications involving scheduling, data analysis, or any context where accurate handling of dates and times is essential. By utilizing the DatetimeOutputParser, developers can streamline the processing of temporal data, ensuring that their applications can effectively manage and respond to time-related information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "NRuzzhXF5Oa1"
   },
   "outputs": [],
   "source": [
    "from langchain.output_parsers import DatetimeOutputParser\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_openai import OpenAI\n",
    "\n",
    "output_parser = DatetimeOutputParser()\n",
    "template = \"\"\"Answer the users question:\n",
    "\n",
    "{question}\n",
    "\n",
    "{format_instructions}\"\"\"\n",
    "prompt = PromptTemplate.from_template(\n",
    "    template,\n",
    "    partial_variables={\"format_instructions\": output_parser.get_format_instructions()},\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "roqInbqGFdUb"
   },
   "source": [
    "We can display that prompt that we will use to obtain dates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "HecixPQl6kYk",
    "outputId": "e688b1a8-b675-4793-d8c9-e0ac2fe01142"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "input_variables=['question'] partial_variables={'format_instructions': \"Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.\\n\\nExamples: 1327-02-03T20:56:56.489822Z, 1058-10-02T02:08:23.921844Z, 682-08-19T01:21:02.307266Z\\n\\nReturn ONLY this string, no other words!\"} template='Answer the users question:\\n\\n{question}\\n\\n{format_instructions}'\n"
     ]
    }
   ],
   "source": [
    "print(prompt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kg5ZkOetFiPx"
   },
   "source": [
    "We create the chain that we will use to parse dates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "id": "2w5gYty_5pKd"
   },
   "outputs": [],
   "source": [
    "chain = prompt | llm | output_parser"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dD-gNvBHFooj"
   },
   "source": [
    "We will query for two dates, one real and the other fictional."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Ymu48Ggg518q",
    "outputId": "4aa29516-440c-4cd7-8ee7-ec18a8a4c4bb"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1991-02-20 00:00:00\n"
     ]
    }
   ],
   "source": [
    "output = chain.invoke({\"question\": \"When was the Python language introduced?\"})\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "nOPWYpxr6As1",
    "outputId": "b9529904-92e8-4248-f686-a1208ef6be09"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2077-10-23 08:00:00\n"
     ]
    }
   ],
   "source": [
    "output = chain.invoke({\"question\": \"What is the date of the war in the video game Fallout?\"})\n",
    "print(output)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3.11 (genai)",
   "language": "python",
   "name": "genai"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
